Underutilized functions for data exploration

Tips from exploring hundreds of variables

Eric Leung

2018-06-02

Given data…

Orange data set on growth of orange trees

The Orange data frame has 35 rows and 3 columns (Tree, age, circumference) recording the growth of orange trees.

Typical exploratory functions

dim()

head()

str()

Review functions: dim()

dim(Orange)
#> [1] 35  3

Review functions: head()

head(Orange)
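
head() shows six rows by default. As a minimal sketch, the n argument changes that, and tail() shows the other end of the data. The names first_rows and last_rows are illustrative only and do not come from the slides.

```r
# Orange ships with base R (datasets package), so no library() call is needed

# First three rows instead of the default six
first_rows <- head(Orange, n = 3)
first_rows

# Last three rows, to check how the data ends
last_rows <- tail(Orange, n = 3)
last_rows
```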

Review functions: str()

str(Orange)
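
The pieces that str() prints can also be pulled out one at a time, which helps when a data set has too many variables to read in one screen. This is a rough sketch in base R; the variable names (n_rows, col_classes, n_missing) are made up for illustration.

```r
# Dimensions, one number at a time
n_rows <- nrow(Orange)
n_cols <- ncol(Orange)

# First class of each column (Tree is an ordered factor, so it has two classes)
col_classes <- sapply(Orange, function(x) class(x)[1])
col_classes

# Count of missing values per column
n_missing <- colSums(is.na(Orange))
n_missing
```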

Frustrations and laziness

Three fruitful functions

Skim through your data quickly

library(skimr)
skim(Orange)
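
With hundreds of variables, skimming everything at once can be a lot to read. As a sketch, assuming a recent version of skimr, skim() also accepts column names after the data so only those columns are summarised. The guard and the name age_circ_skim are additions for illustration, not part of the talk.

```r
# skimr must be installed; the guard skips the call if it is not
age_circ_skim <- NULL
if (requireNamespace("skimr", quietly = TRUE)) {
  # Summarise only the two numeric columns
  age_circ_skim <- skimr::skim(Orange, age, circumference)
  print(age_circ_skim)
}
```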

Keep grounded with the basics

stem(Orange$age)
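
stem() prints a stem-and-leaf plot straight to the console, and its scale argument controls how long the plot is. A small sketch follows; the value 2 is an arbitrary choice for illustration, and capture.output() is only used here to keep the printed lines in a variable (stem_lines, a made-up name).

```r
# Default plot length
stem(Orange$age)

# A longer plot: scale = 2 roughly doubles the plot length
stem(Orange$circumference, scale = 2)

# Keep the printed plot as a character vector, e.g. to paste into notes
stem_lines <- capture.output(stem(Orange$age))
```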

Better description of your data

library(Hmisc)
describe(Orange)
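
describe() also works on a single variable, which is handy when only one column looks suspicious. This is a sketch assuming Hmisc is installed; calling it as Hmisc::describe() avoids attaching the whole package. The guard and the name age_description are illustrative additions.

```r
# Hmisc must be installed; the guard skips the call if it is not
age_description <- NULL
if (requireNamespace("Hmisc", quietly = TRUE)) {
  # Describe one column instead of the whole data frame
  age_description <- Hmisc::describe(Orange$age)
  print(age_description)
}
```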

Summary

Hmisc::describe()

base::stem()

skimr::skim()